
3797: fix: query tolerance= in SQL file tests now also asserts Comet native execution #50

Open
martin-augment wants to merge 4 commits into main from pr-3797-2026-03-29-11-27-45

Conversation

@martin-augment (Owner)

3797: To review by AI

… execution

The WithTolerance mode in CometSqlFileTestSuite called
checkSparkAnswerWithTolerance, which hardcodes assertCometNative=false.
This meant that all 50+ SQL test queries using 'query tolerance=...'
(sin, cos, tan, stddev, avg, etc.) silently skipped the check that
Comet actually executed the expression natively.

Add checkSparkAnswerAndOperatorWithTolerance that combines tolerance-
based result comparison with the native operator assertion, and use it
in the WithTolerance case.
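As a rough illustration (not the actual CometTestBase code), the tolerance comparison that the new helper combines with the native-operator assertion can be sketched as follows; the name rowsMatch is hypothetical:

```scala
// Hypothetical stand-in for the tolerance check behind `query tolerance=...`:
// two result rows match when every value is within an absolute tolerance,
// with NaN compared equal to NaN.
def rowsMatch(expected: Seq[Double], actual: Seq[Double], tol: Double): Boolean =
  expected.length == actual.length &&
    expected.zip(actual).forall { case (e, a) =>
      (e.isNaN && a.isNaN) || math.abs(e - a) <= tol
    }
```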
Maps Spark's Logarithm expression to DataFusion's log(base, value) function.
Applies nullIfNegative to both base and value to match Spark's behavior of
returning null when the result would be NaN (inputs <= 0).
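The null-guard described above can be modeled in plain Scala. This is a sketch of the semantics only, not the serde itself; nullIfNonPositive and cometLog are illustrative names (the actual helper is called nullIfNegative but, as noted later in the thread, it guards all non-positive values):

```scala
// Sketch only: models the guard CometLogarithm applies to log(base, value).
// Non-positive inputs become null (None); all other inputs evaluate normally,
// so a base of 1 still yields Infinity, as Spark does.
def nullIfNonPositive(v: Double): Option[Double] =
  if (v <= 0.0) None else Some(v)

def cometLog(base: Double, value: Double): Option[Double] =
  for {
    b <- nullIfNonPositive(base)
    x <- nullIfNonPositive(value)
  } yield math.log(x) / math.log(b)
```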

coderabbitai bot commented Mar 29, 2026

No actionable comments were generated in the recent review. 🎉


📥 Commits

Reviewing files that changed from the base of the PR and between c3aadf9 and c9b53d7.

📒 Files selected for processing (5)
  • spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala
  • spark/src/main/scala/org/apache/comet/serde/math.scala
  • spark/src/test/resources/sql-tests/expressions/math/tan.sql
  • spark/src/test/scala/org/apache/comet/CometSqlFileTestSuite.scala
  • spark/src/test/scala/org/apache/spark/sql/CometTestBase.scala

Walkthrough

This pull request adds support for serializing Spark's Logarithm expression in Comet. It introduces a CometLogarithm serde handler that converts Logarithm expressions to log scalar functions with base and value operands, applying null-safety checks. The changes include registering the new handler in the expression mapping, updating imports, and enhancing the test infrastructure to validate both query results and operator-level execution with floating-point tolerance settings. A configuration option for Tan expression testing is also added.



gemini-code-assist bot left a comment


Code Review

This pull request implements the Logarithm expression in Comet, adding it to the query plan serialization and providing a specific implementation in the math serde. It also updates the test suite to include checkSparkAnswerAndOperatorWithTolerance, which ensures that Comet native operators are used when verifying results with a tolerance. A review comment points out that the current Logarithm implementation fails to handle the case where the base is 1, which should return null to maintain compatibility with Spark's behavior.

Comment on lines +146 to +149
// Spark's Logarithm(left=base, right=value) returns null when result is NaN,
// which happens when base <= 0 or value <= 0. Apply nullIfNegative to both.
val leftExpr = exprToProtoInternal(nullIfNegative(expr.left), inputs, binding)
val rightExpr = exprToProtoInternal(nullIfNegative(expr.right), inputs, binding)


Severity: high

The current implementation of CometLogarithm doesn't correctly handle the case where the logarithm base is 1. Spark's Logarithm expression returns null when the base is 1. This implementation is missing that check, which can lead to incorrect results (e.g., Infinity or NaN) from the native kernel instead of null.

To align with Spark's behavior, you should add a check to return null if expr.left is 1. I've also updated the comment to reflect this.

Suggested change:

- // Spark's Logarithm(left=base, right=value) returns null when result is NaN,
- // which happens when base <= 0 or value <= 0. Apply nullIfNegative to both.
- val leftExpr = exprToProtoInternal(nullIfNegative(expr.left), inputs, binding)
- val rightExpr = exprToProtoInternal(nullIfNegative(expr.right), inputs, binding)
+ // Spark's Logarithm(left=base, right=value) returns null when result is NaN,
+ // which happens when base <= 0, base = 1, or value <= 0.
+ val baseExpr = If(expr.left === Literal(1.0), Literal(null, expr.left.dataType), nullIfNegative(expr.left))
+ val leftExpr = exprToProtoInternal(baseExpr, inputs, binding)
+ val rightExpr = exprToProtoInternal(nullIfNegative(expr.right), inputs, binding)

Owner Author


value:annoying; category:bug; feedback: The Gemini AI reviewer is not correct!

spark-sql (default)> SELECT log(1, 5);
Infinity
Time taken: 0.087 seconds, Fetched 1 row(s)

It returns Infinity, not null, when the base is 1.


claude bot commented Mar 29, 2026

This PR does two things: it adds native Comet execution for Spark's 2-arg Logarithm expression (log(base, value)), and it fixes CometSqlFileTestSuite so that query tolerance=X queries now properly assert Comet native execution (previously they only verified correctness without checking that Comet actually ran natively). The tan.sql change is a necessary follow-on from the test fix.

The test framework fix

This is a good catch. Before this PR, WithTolerance called checkSparkAnswerWithTolerance which passes assertCometNative = false. That means tolerance queries were silently passing even when Comet was falling back to Spark, defeating the whole point of having them in the SQL file test suite. The fix to call checkSparkAnswerAndOperatorWithTolerance is correct.

The tan.sql update (adding spark.comet.expression.Tan.allowIncompatible=true) makes sense because Tan is marked Incompatible and now needs the flag to pass the stricter assertion. Other expressions using query tolerance= that are marked Incompatible (like corr.sql) already had allowIncompatible configured. Variance, stddev, avg, and sum are all Compatible() so they should be unaffected.

The CometLogarithm implementation

The implementation follows the established pattern for log functions in this file, using nullIfNegative on both arguments. The general comment at lines 104-106 documents this pattern well: non-positive values evaluate to null instead of NaN or -Infinity.

One small note on the inline comment in CometLogarithm: it says returns null when result is NaN, which happens when base <= 0 or value <= 0. For strictly negative inputs this is accurate (since Math.log(-x) = NaN), but for zero specifically Spark actually returns -Infinity or -0.0 rather than NaN. So the result is not NaN for zero inputs. Comet returns null for them anyway, consistent with the nullIfNegative approach used by CometLog, CometLog2, and CometLog10. The comment would be clearer if it simply mirrored the general comment above CometLog rather than inventing a new justification.

Missing test coverage

The new CometLogarithm implementation has no dedicated SQL test file. The existing log.sql does test the 2-arg form, but only with positive values in the test data. It might be worth expanding log.sql (or adding a logarithm.sql) to cover:

  • A negative base: log(-2.0, 4.0) should return null in both Spark and Comet
  • A negative value: log(2.0, -4.0) should return null in both
  • Base = 1: log(1.0, 10.0) where Spark returns Infinity; worth verifying Comet agrees
  • Both args as columns (not just column + literal combinations)

Without these, the query tolerance= tests in log.sql will now assert native execution of CometLogarithm but only for the happy-path values already there.

Minor

checkSparkAnswerAndOperatorWithTolerance in CometTestBase.scala is a clean addition. One observation: the existing checkSparkAnswerAndOperatorWithTol method (line 252) appears to serve the same purpose for DataFrame inputs. It might be worth checking whether the two can be unified, but this is not blocking.

Overall this is a solid, consistent change. The main thing worth addressing before merge is the missing edge-case coverage for Logarithm, particularly negative and zero inputs.


augmentcode bot commented Mar 29, 2026

🤖 Augment PR Summary

Summary: This PR tightens SQL-file test expectations for tolerance-based queries so they also verify Comet native execution.

Changes:

  • Adds native serde support for Spark’s Logarithm expression (2-arg log(base, value))
  • Updates SQL file test runner so tolerance=... queries assert both answer correctness and native operator coverage
  • Adds a new test helper to combine tolerance comparisons with native-coverage assertions
  • Updates the tan.sql test to enable an explicitly incompatible expression via config

Technical Notes: This makes tolerance-based SQL tests stricter by requiring Comet-native plans (unless explicitly configured to allow incompatible behavior).



augmentcode bot left a comment


Review completed. 1 suggestion posted.


expr: Logarithm,
inputs: Seq[Attribute],
binding: Boolean): Option[ExprOuterClass.Expr] = {
// Spark's Logarithm(left=base, right=value) returns null when result is NaN,


The comment says Spark returns null when the result is NaN, but for log functions Spark/Hive semantics are generally “non-positive inputs => null” (e.g., log(..., 0) would otherwise yield -Infinity, not NaN). This wording could mislead future changes to the null-guard logic.

Severity: low


@martin-augment (Owner, Author) commented Mar 29, 2026


value:useful; category:documentation; feedback: The Augment AI reviewer is correct! The comment says that NaN results are converted to NULL, but value=0 leads to a -Infinity result, which is also treated as NULL by Apache Spark. Prevents committing a confusing comment.


cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


val optExpr = scalarFunctionExprToProto("log", leftExpr, rightExpr)
optExprWithInfo(optExpr, expr, expr.left, expr.right)
}
}


CometLogarithm mishandles NaN inputs producing NaN instead of null

High Severity

CometLogarithm uses nullIfNegative on both inputs, which only converts values <= 0 to null. But Spark's Logarithm applies a nullCheck on the result, returning null whenever the output is NaN. This means NaN inputs (where NaN <= 0 is false) pass through nullIfNegative unchanged, and DataFusion then produces NaN where Spark would produce null. The existing log.sql test includes NaN in its test data (cast('NaN' as double)), so queries like log(10.0, NaN) will return NaN from Comet but null from Spark, causing a test failure and incorrect production results. The same mismatch occurs for log(1, 1), where both inputs are valid but the result is 0/0 = NaN.


@martin-augment (Owner, Author)


value:annoying; category:bug; feedback: The Bugbot AI reviewer is not correct!

spark-sql (default)> SELECT log(1, 1);
NaN
Time taken: 0.072 seconds, Fetched 1 row(s)

Apache Spark 4.0.2 returns NaN here, not NULL.

@martin-augment (Owner, Author)

checkSparkAnswerAndOperatorWithTolerance in CometTestBase.scala is a clean addition. One observation: the existing checkSparkAnswerAndOperatorWithTol method (line 252) appears to serve the same purpose for DataFrame inputs. It might be worth checking whether the two can be unified, but this is not blocking.

value:good-to-have; category:bug; feedback: There are similar methods that accept a DataFrame instead of a SQL query string; the new helper could delegate to them. That way both the SQL- and DataFrame-based APIs will pass the same extra arguments.
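A sketch of the delegation this suggests (all names hypothetical, not the real CometTestBase API): keep the assertions in one DataFrame-style core method and let the SQL-string overload forward to it, so extra arguments stay in sync between the two entry points.

```scala
// FakeDf stands in for a Spark DataFrame; run() stands in for spark.sql().
final case class FakeDf(rows: Seq[Seq[Double]])

object ToleranceHelpers {
  // Core helper: stand-in for the tolerance + native-operator assertions.
  def checkWithTolerance(df: FakeDf, expected: Seq[Seq[Double]], tol: Double): Boolean =
    df.rows.size == expected.size && df.rows.zip(expected).forall { case (a, e) =>
      a.size == e.size && a.zip(e).forall { case (x, y) => math.abs(x - y) <= tol }
    }

  // The SQL-string overload only builds the "DataFrame" and delegates,
  // so both entry points share one implementation.
  def checkWithTolerance(query: String, expected: Seq[Seq[Double]], tol: Double): Boolean =
    checkWithTolerance(run(query), expected, tol)

  // Placeholder query runner: parses "1.0, 2.0; 3.0, 4.0" into rows.
  private def run(query: String): FakeDf =
    FakeDf(query.split(";").toSeq.map(_.split(",").toSeq.map(_.trim.toDouble)))
}
```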

